Conversation

@Felixoid

@Felixoid Felixoid commented Dec 18, 2025

This is an attempt to fix #35.

This adds the SCCACHE_BASEDIRS environment variable and the basedirs configuration parameter, along with new tests that validate the behavior.

@codecov-commenter

codecov-commenter commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 98.60896% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.51%. Comparing base (cd7dcd5) to head (9eb3241).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/cache/cache.rs | 68.42% | 6 Missing ⚠️ |
| tests/oauth.rs | 0.00% | 2 Missing ⚠️ |
| src/util.rs | 99.21% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2521      +/-   ##
==========================================
+ Coverage   71.04%   71.51%   +0.47%     
==========================================
  Files          64       64              
  Lines       35369    35991     +622     
==========================================
+ Hits        25128    25740     +612     
- Misses      10241    10251      +10     


@Felixoid Felixoid force-pushed the add-basedir-configuration branch from bfba6ec to 9eb3241 Compare December 18, 2025 23:45
@Felixoid Felixoid force-pushed the add-basedir-configuration branch from e818064 to d2e6edd Compare December 22, 2025 10:47
    .iter()
    .map(|basedir| {
        let normalized = basedir.to_string_lossy();
        let trimmed = normalized.trim_end_matches('/').trim_end_matches('\\');
Contributor

Are both c:/test and c:\test recognized as a prefix if only c:/test was passed as the prefix?

Windows builds may mix slashes in paths.

Author

No, and I'm not sure how it should be addressed.

In any case, it's possible to put both cases into the basedirs parameter. Or do you mean that both slashes should be treated as directory separators?

Contributor

Yes, both slashes may be a directory separator.

In this example there is only one slash in the path, but for c:/very/long/base/directory/path/with/slashes we would need to pass 2^7 = 128 variants, which is not good.

Author

Sad story, never knew MS started supporting forward slashes as the directory separators. And the case where they're mixed sounds cursed to me.

But the point is valid; normalization should be done automatically.
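The automatic normalization agreed on here could look roughly like the sketch below. It is an illustration only, not the PR's code: the helper names normalize_separators and has_basedir_prefix are hypothetical, and it assumes the rewrite is applied only on Windows, since a backslash is a valid file name character on Unix.

```rust
/// Hypothetical helper: rewrite every backslash as a forward slash so that
/// `c:\test`, `c:/test`, and mixed forms compare equal.
/// Assumption: only applied on Windows, where `\` cannot be part of a file name.
fn normalize_separators(path: &str) -> String {
    path.replace('\\', "/")
}

/// Returns true if `path` lies under `basedir`, treating `/` and `\` as
/// equivalent separators.
fn has_basedir_prefix(path: &str, basedir: &str) -> bool {
    let path_norm = normalize_separators(path);
    let basedir_norm = normalize_separators(basedir);
    let basedir_trimmed = basedir_norm.trim_end_matches('/');
    match path_norm.strip_prefix(basedir_trimmed) {
        // The prefix must end at a component boundary, so `c:/test`
        // does not match `c:/testing`.
        Some(rest) => rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}
```

With this approach a single basedir entry covers every slash combination, so the 128 variants from the example above are not needed.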

Contributor

Windows and macOS have always had case folding anyways. Linux now also has per-directory flags for case-insensitivity (e.g., ext4), and it has always had the issue with some file systems (e.g., FAT). I believe inode comparisons can also fail with subvolume mounts.

Anyways, case and separator confusion are the only issues that I think need to be considered here. Actually using case-insensitive or confusable subvolume file system setups for sccache on Linux use cases seems cursed.

Author

@Felixoid Felixoid Dec 23, 2025

I prefer a case-sensitive approach for Unix OSs and a case-insensitive approach for Windows. The macOS behavior was one of the most confusing I have ever seen, with inconsistent case sensitivity. So, I partially reverted the latest commit with global case insensitivity.

MacOS FS behavior

As far as I remember, an experiment from several years ago showed the following behavior of Apple's FS:

  • If there was only a file named a, ls A would show it
  • If there were both a and A files, ls A would show the correct A file
  • touch a and touch A would create both files properly

I think the path should match exactly in this case, without complex logic for searching files.

I can't argue with the Win approach; I don't use it anywhere, and I've forgotten every edge case it could have.

Can I implement a case-sensitive match on Unix and a case-insensitive match on Windows, so that the latter can be improved later if necessary?

I added trace! logging, which showed that not everything was covered at the moment. It should be better now.

BTW, thanks a lot for a fast-paced review! 👍

Contributor

macOS does Unicode normalization too (NFC IIRC, but it doesn't really matter). macOS is indeed weird and is in a case-sensitive transition of some kind, but I lack an Apple fortuneteller to know any plans or schedules.

macOS is far better at case preservation (i.e., if you refer to a as A, most everything will remember that). Windows tools will do what they do, and internal normalization is common (slash flipping, all-upper, all-lower, 8.3 transforms, UNC syntax, you name it). I'm fine with the "Windows is case-insensitive and everything else is case-sensitive" plan, as mixups mostly happen on Windows. Having the ability to get a log of issues will definitely help people use consistent cases where they need to.
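The plan settled on in this thread (case-insensitive matching on Windows, case-sensitive everywhere else) can be sketched as follows. The function names are hypothetical, and the folding is ASCII-only, which is an assumption of this sketch rather than something the thread decided.

```rust
/// Hypothetical helper: does `candidate` start with `basedir`?
/// `case_insensitive` would be `cfg!(windows)` under the plan discussed here.
fn prefix_matches(candidate: &[u8], basedir: &[u8], case_insensitive: bool) -> bool {
    if candidate.len() < basedir.len() {
        return false;
    }
    let head = &candidate[..basedir.len()];
    if case_insensitive {
        // Assumption: ASCII-only folding; non-ASCII case folding and
        // Unicode normalization are not attempted.
        head.eq_ignore_ascii_case(basedir)
    } else {
        head == basedir
    }
}

/// Applies the platform policy: case-insensitive on Windows,
/// case-sensitive on every other OS (including macOS).
fn prefix_matches_for_platform(candidate: &[u8], basedir: &[u8]) -> bool {
    prefix_matches(candidate, basedir, cfg!(windows))
}
```

Passing the policy as a parameter keeps both branches testable on any host, while the wrapper fixes the behavior per target at compile time.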


As far as I remember, a several-year-old experiment did show the following apple's FS behavior:

  • If there was only a file, ls A would show it

  • If there were a and A files, ls A would show the correct A file

  • touch a and touch A would create files properly

It depends on your volume settings. By default, a stock macOS installation has a case-insensitive volume, so touch a and touch A refer to the same file. As for the POSIX API, when you obtain a file descriptor, it remembers exactly the path you used to obtain it (AFAIK, it does not get normalised), but the case is lost once the actual file accesses go through the kernel. The FS, however, remembers the "first" casing of the file, i.e., the one used when it was created.

What's interesting is that despite the underlying FS being case insensitive, the Shell itself retains case sensitivity when globbing… (This might be tuneable depending on the shell!)

See:

❯ echo "hello world" > hello.txt
❯ echo "HELLO WORLD" > HELLO.txt
❯ ls -alh *.txt
Permissions Size User       Date Modified Name
.rw-r--r--    12 whisperity 24 Dec 12:01  hello.txt
❯ cat hello.txt; cat HELLO.txt
HELLO WORLD
HELLO WORLD

❯ echo "goodbye" > BYE.TXT
❯ echo "GOODBYE" > bye.txt
❯ ls -alh *.txt
Permissions Size User       Date Modified Name
.rw-r--r--    12 whisperity 24 Dec 12:01  hello.txt
❯ ls -alh *.TXT
Permissions Size User       Date Modified Name
.rw-r--r--     8 whisperity 24 Dec 12:02  BYE.TXT
❯ cat bye.txt BYE.TXT
GOODBYE
GOODBYE

Many tools, such as Git or Subversion, can run into issues when people change the case of files in a repository or commit multiple files that differ only in case. Recent versions of Git handle this gracefully, notifying you that the checkout failed because case-insensitivity caused an existing file to be overwritten with different content. Subversion is much worse in this regard: it just shows a content conflict. All of these issues are resolvable (if you have commit rights), but untangling them takes some VCS-specific maintainer spelunking.

You can create a case-sensitive volume, and mount it somewhere, at which point everything will behave as if you were on Linux. 🙂 In fact, if you ever reach a point that you need to self-compile GCC on Mac, this is something you will have to suffer.

❯ cd /Volumes/Case-Sensitive      # Just a random volume I have created and mounted long ago thanks to GCC…
❯ echo "hello world" > hello.txt
❯ echo "HELLO WORLD" > HELLO.txt
❯ echo "goodbye" > BYE.TXT
❯ echo "GOODBYE" > bye.txt
❯ cat *.txt
HELLO WORLD
GOODBYE
hello world
❯ cat *.TXT
goodbye
❯ ls -alhi
 inode Permissions Size User       Date Modified Name
937334 .rw-r--r--     8 whisperity 24 Dec 12:06  bye.txt
937333 .rw-r--r--     8 whisperity 24 Dec 12:06  BYE.TXT
937335 .rw-r--r--    12 whisperity 24 Dec 12:07  HELLO.txt
937332 .rw-r--r--    12 whisperity 24 Dec 12:06  hello.txt

MacOS 15.5


Also, a higher-level question. Is there no Rust library that could handle these OS-specifics for paths instead of us having to juggle around strings in a potentially clunky or edge-sharp way? In Python, you'd have the pathlib standard library giving you the right platform-specific behaviour. On GNU/Linux, there are also the realpath and readlink binaries (GNU realpath can even do --relative-to!) and the POSIX realpath() call, which do normalisations and such.

Author

Is there no Rust library that could handle these OS-specifics for paths instead of us having to juggle around strings in a potentially clunky or edge-sharp way?

The same issue as described nearby. The preprocessor_output is not a path; it's a long plain text. To cut a path from there to check a prefix/relative path, we need to identify its boundaries.

While thinking about it, I realized the function is suboptimal anyway. Instead of iterating byte by byte, it should search for every basedir, and then take the lowest found position.

Back to your point, how do you think paths in the preprocessor_output could be normalized?
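The search strategy described in this comment (look for every basedir, then take the lowest position found) might be sketched like this. The names find_from and next_basedir_match are hypothetical, and breaking ties in favor of the longest basedir is an assumption borrowed from the "longest matching prefix" rule in the documentation.

```rust
use std::cmp::Reverse;

/// Hypothetical helper: first occurrence of `needle` in `haystack`
/// at or after offset `from`.
fn find_from(haystack: &[u8], needle: &[u8], from: usize) -> Option<usize> {
    if needle.is_empty() || from >= haystack.len() {
        return None;
    }
    haystack[from..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|pos| pos + from)
}

/// Returns `(position, basedir_index)` of the earliest basedir occurrence
/// at or after `from`. On a tie, the longest basedir wins.
fn next_basedir_match(
    output: &[u8],
    basedirs: &[&[u8]],
    from: usize,
) -> Option<(usize, usize)> {
    basedirs
        .iter()
        .enumerate()
        .filter_map(|(idx, dir)| find_from(output, *dir, from).map(|pos| (pos, idx)))
        // Smallest position first; for equal positions prefer the longer basedir.
        .min_by_key(|&(pos, idx)| (pos, Reverse(basedirs[idx].len())))
}
```

A caller would loop, copying the text up to each match, emitting the rewritten prefix, and resuming the search after it. The boundary check discussed elsewhere in this review would still have to be applied to each hit.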

@Felixoid
Author

I got an idea, that basedirs should be added to the stats command as well

@Felixoid Felixoid force-pushed the add-basedir-configuration branch from ca0a0db to 6f47f36 Compare December 23, 2025 13:57
@Felixoid Felixoid changed the title Add SCCACHE_BASEDIR support Add SCCACHE_BASEDIRS support Dec 23, 2025
@Felixoid Felixoid force-pushed the add-basedir-configuration branch from b5a7d22 to cf0b871 Compare December 23, 2025 23:03
@AJIOB
Contributor

AJIOB commented Dec 24, 2025

I got an idea, that basedirs should be added to the stats command as well

I think we can also provide stats about how often base dirs are applied:

  • Number of requests where a base dir was applied
  • Number of requests where base dirs were skipped

That way we can see whether we need to provide more or better base dirs.

Comment on lines +9 to +20
# Base directory (or directories) to strip from paths for cache key computation.
# Similar to ccache's CCACHE_BASEDIR. This enables cache hits across
# different absolute paths when compiling the same source code.
# Can be an array of paths. When multiple paths are provided,
# the longest matching prefix is used.
# Path matching is case-insensitive on Windows and case-sensitive on other OSes.
# For example, if basedir is "/home/user/project", then paths like
# "/home/user/project/src/main.c" will be normalized to "./src/main.c"
# for caching purposes.
basedirs = ["/home/user/project"]
# Or multiple directories:
# basedirs = ["/home/user/project", "/home/user/workspace"]


Bit of a rephrasing suggestion:

Suggested change
# Base directories to strip from source paths during cache key
# computation.
#
# Similar to ccache's CCACHE_BASEDIR, but supports multiple paths.
#
# 'basedirs' enables cache hits across different absolute root
# paths when compiling the same source code, such as between
# parallel checkouts of the same project, Git worktrees, or different
# users in a shared environment.
# When multiple paths are provided, the longest matching prefix
# is applied.
#
# Path matching is case-insensitive on Windows and case-sensitive on other OSes.
#
# Example:
# basedirs = ["/home/user/project"] results in the path prefix rewrite:
# "/home/user/project/src/main.c" -> "./src/main.c"
basedirs = ["/home/user/project"]
# basedirs = ["/home/user/project", "/home/user/workspace"]

I'd not say "can be an array of paths" if the "one path" example already is an array. What happens if you do basedirs = "/home/foo/bar"? (I hope it's not the dreaded Python-like behaviour that it starts chewing away an array of characters one by one…)


* `SCCACHE_ALLOW_CORE_DUMPS` to enable core dumps by the server
* `SCCACHE_CONF` configuration file path
* `SCCACHE_BASEDIRS` base directory (or directories) to strip from paths for cache key computation. This is similar to ccache's `CCACHE_BASEDIR` and enables cache hits across different absolute paths when compiling the same source code. Multiple directories can be separated by `|` (pipe character). When multiple directories are specified, the longest matching prefix is used. Path matching is **case-insensitive** on Windows and **case-sensitive** on other operating systems. Environment variable takes precedence over file configuration. Only absolute paths are supported; relative paths will be ignored with a warning.


Usually when you provide multiple directories or file paths in environment variables (such as PATH, LD_LIBRARY_PATH, LD_PRELOAD, etc.), the convention is to use ; as a separator. Why was | chosen here instead?

Contributor

nit: Linux way is to use : as a PATH delimiter, but Windows uses : for drive letters


nit: Linux way is to use : as a PATH delimiter

Ah, yes. Of course I mixed it up with CMake, which generally uses ; as an array separator…

Author

I chose | because both , and : can be part of a valid path. I didn't think of other options, but testing it blew my mind:

> ls -ld test*dir
drwxr-xr-x 2 felixoid felixoid 6 Dec 24 14:13  test:dir
drwxr-xr-x 2 felixoid felixoid 6 Dec 24 14:12 'test;dir'
drwxr-xr-x 2 felixoid felixoid 6 Dec 24 14:13 'test|dir'

The colon doesn't look like an option to me because of the c:/ format. The comma is too familiar.

I am open to suggestions. Maybe ; is a decent compromise.

Contributor

Please use ; on Windows and : elsewhere. There's no "safe" character on POSIX as any non-NUL, non-/ byte is valid in filenames and everything uses : for path list separators. If :-escaping is needed, someone will come and make noise about it. It can still be considered whether it is a valid use case at that time.
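For reference, Rust's standard library already implements this convention: std::env::split_paths splits on ; on Windows and on : on Unix. A minimal sketch of parsing the variable that way, assuming a hypothetical parse_basedirs helper (this is not the PR's code):

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical parser for the value of `SCCACHE_BASEDIRS`.
/// `std::env::split_paths` follows the platform's PATH convention:
/// `;` on Windows and `:` on Unix.
fn parse_basedirs(raw: &OsStr) -> Vec<PathBuf> {
    env::split_paths(raw)
        // Skip empty entries, e.g. those produced by a trailing separator.
        .filter(|p| !p.as_os_str().is_empty())
        .collect()
}
```

A caller would pass the result of std::env::var_os("SCCACHE_BASEDIRS"). On Unix this means a basedir containing : cannot be expressed, which matches the trade-off accepted in the comment above.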

Comment on lines +752 to +755
warn!(
    "Ignoring relative basedir path: {:?}. Only absolute paths are supported.",
    p
);


If a relative basedir is an invalid configuration, wouldn't it make more sense to error here and get the dev/server admin to fix their config?

Author

By "error," you mean "fail to start"? It could be an option as well, I didn't think of it as a misconfiguration.

Looks OK to me to hard fail here.
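The hard-fail variant agreed on here could be sketched as below. The function name validate_basedirs and the String error type are assumptions made for illustration; the real code would presumably use the project's own error handling.

```rust
use std::path::PathBuf;

/// Hypothetical validation step: reject the whole configuration if any
/// basedir is relative, instead of warning and ignoring the entry.
fn validate_basedirs(basedirs: Vec<PathBuf>) -> Result<Vec<PathBuf>, String> {
    for dir in &basedirs {
        if !dir.is_absolute() {
            return Err(format!(
                "basedir {:?} is relative; only absolute paths are supported",
                dir
            ));
        }
    }
    Ok(basedirs)
}
```

Returning an error lets the caller decide how to stop (for example, refusing to start the server), so the misconfiguration surfaces immediately rather than as silent cache misses.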

Comment on lines +587 to +589
/// Base directory (or directories) to strip from paths for cache key computation.
/// Can be a single path or an array of paths.
pub basedirs: Vec<PathBuf>,


Suggested change
/// Base directories to strip from paths for cache key computation.
pub basedirs: Vec<PathBuf>,

Vec already captures the notion that it can be multiple elements.

Author

Right, it's a leftover from the first implementation.

Comment on lines +1083 to +1087
// Check if this is actually a path boundary (preceded by whitespace, quote, or start)
let is_boundary = i == 0
    || preprocessor_output[i - 1].is_ascii_whitespace()
    || preprocessor_output[i - 1] == b'"'
    || preprocessor_output[i - 1] == b'<';


This might be ultra cursed even beyond the case sensitivity issue here, but on Linux and MacOS, whitespace and " are perfectly legit members of a path element's name, as long as they are escaped.

/home/Whisperity/Projects/The\ LLVM\ Compiler\ Infrastructure\ Project/.Worktrees/Research:\ \"The Big Lebowski\" is a legit path to have as a basedir. Granted, you can simplify all these escapes if you wrap things in quotes, but there is a countably infinite depth of escape hell you can get into here.

❯ gstat "./Projects/The LLVM Compiler Infrastructure Project/.Worktrees/Research: \"The Big Lebowski\""
  File: ./Projects/The LLVM Compiler Infrastructure Project/.Worktrees/Research: "The Big Lebowski"
  Size: 64              Blocks: 0          IO Block: 4096   directory
Device: 1,17    Inode: 117446528   Links: 2
Access: (0755/drwxr-xr-x)  Uid: (  501/whisperity)   Gid: (   20/   staff)

Contributor

IMO, quotes are a rare case, so I'd suggest adding support for them separately.

Author

/home/Whisperity/Projects/The\ LLVM\ Compiler\ Infrastructure\ Project/.Worktrees/Research:\ \"The Big Lebowski\"

I think it's fine. The file will still be processed. The preprocessor_output[i - 1] == b'"' check tests the opening boundary of the path.

To get struck by an issue, I could imagine the following:

basedir = '/home/user/project' and preprocessor_output = '"/home/user2/test-dir/tragic\ mistake\ \"/home/user/project/real-dir/file.c'

But I'd consider that a current limitation of the system. The main issue here is that preprocessor_output is plain text. It can contain a lot of things =\
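The limitation described here can be reproduced with a small sketch. is_path_boundary mirrors the condition from the diff under review; boundary_matches is a hypothetical helper added only for the demonstration.

```rust
/// Mirrors the boundary check quoted in this review: a match at `i` counts
/// as the start of a path only if it is at the start of the text or is
/// preceded by whitespace, `"`, or `<`.
fn is_path_boundary(preprocessor_output: &[u8], i: usize) -> bool {
    i == 0
        || preprocessor_output[i - 1].is_ascii_whitespace()
        || preprocessor_output[i - 1] == b'"'
        || preprocessor_output[i - 1] == b'<'
}

/// Hypothetical helper: offsets of every `basedir` occurrence that passes
/// the boundary check.
fn boundary_matches(preprocessor_output: &[u8], basedir: &[u8]) -> Vec<usize> {
    if basedir.is_empty() {
        return Vec::new();
    }
    preprocessor_output
        .windows(basedir.len())
        .enumerate()
        .filter(|&(i, w)| w == basedir && is_path_boundary(preprocessor_output, i))
        .map(|(i, _)| i)
        .collect()
}
```

For the input above, the only hit is the occurrence that follows the escaped quote (\"), which sits in the middle of a longer path. The check accepts it because it looks only at the single preceding byte and cannot tell an escaped quote from an opening one.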

@Felixoid
Author

  • Number of base dir applied requests
  • Number of base dir skipped requests

It's a tricky one. After taking a look, the number of successful substitutions is relatively easy to implement, although the counter has to be threaded through to ServerStats.

But how should the number of skipped directories be counted? What does it mean? That a base_directory didn't match anything in the output at all?
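One possible reading, sketched below, counts per request rather than per directory: a request is "applied" if at least one substitution happened and "skipped" otherwise. This interpretation is an assumption, and BasedirStats is a hypothetical type, not a field of sccache's actual ServerStats.

```rust
/// Hypothetical counters; not part of sccache's actual `ServerStats`.
#[derive(Debug, Default, PartialEq)]
struct BasedirStats {
    /// Requests where at least one basedir prefix was rewritten.
    applied_requests: u64,
    /// Requests where basedirs were configured but nothing matched.
    skipped_requests: u64,
}

impl BasedirStats {
    /// Record one compile request given how many substitutions it produced.
    fn record(&mut self, substitutions: usize) {
        if substitutions > 0 {
            self.applied_requests += 1;
        } else {
            self.skipped_requests += 1;
        }
    }
}
```

Under this reading, a high skipped count would suggest the configured base dirs do not cover the directories actually being compiled.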

Development

Successfully merging this pull request may close these issues.

Implement an equivalent to CCACHE_BASEDIR

5 participants